Linear Systems over Join-Blank Algebras
A central problem of linear algebra is solving linear systems. Regarding
linear systems as equations over general semirings $(V,\otimes,\oplus,0,1)$ instead
of rings or fields makes traditional approaches impossible. Earlier work shows
that the solution space X(A;w) of the linear system Av = w over the class of
semirings called join-blank algebras is a union of closed intervals (in the
product order) with a common terminal point. In the smaller class of max-blank
algebras, the additional hypothesis that the solution spaces of the 1x1 systems
Av = w are closed intervals implies that X(A;w) is a finite union of closed
intervals. We examine the general case, proving that without this additional
hypothesis, we can still make X(A;w) into a finite union of quasi-intervals.
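As a small illustration of the interval structure described above, the sketch below enumerates X(A;w) by brute force. It assumes a toy semiring on the chain {0, 1, 2, 3} with max as the addition and min as the multiplication. This choice and all helper names are assumptions made for illustration; they are not taken from the paper, whose definition of a join-blank algebra may be more general.

```python
from itertools import product

def semiring_matvec(A, v):
    """Compute Av with max as the addition and min as the multiplication."""
    return tuple(max(min(a, x) for a, x in zip(row, v)) for row in A)

def solutions(A, w, values):
    """Brute-force solution set X(A;w) over a small finite value set."""
    n = len(A[0])
    return [v for v in product(values, repeat=n)
            if semiring_matvec(A, v) == tuple(w)]

def leq(u, v):
    """Product order: u <= v componentwise."""
    return all(a <= b for a, b in zip(u, v))

def minimal_elements(points):
    """Points with no strictly smaller point in the same set."""
    return sorted(p for p in points
                  if not any(q != p and leq(q, p) for q in points))

values = range(4)          # toy chain 0 < 1 < 2 < 3 (assumption)
A = [[2, 2]]               # a single equation in two unknowns
w = [2]

sols = solutions(A, w, values)
minimal = minimal_elements(sols)
# Componentwise maximum of all solutions: the candidate common endpoint.
top = tuple(max(v[i] for v in sols) for i in range(len(A[0])))

# Every solution lies in some closed interval [m, top] with m minimal.
covered = all(any(leq(m, v) and leq(v, top) for m in minimal) for v in sols)
print(minimal, top, covered)
```

In this toy case the solutions form two closed intervals, one starting at (0, 2) and one at (2, 0), that share the terminal point (3, 3).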
Constructing Adjacency Arrays from Incidence Arrays
Graph construction, a fundamental operation in a data processing pipeline, is
typically done by multiplying the incidence array representations of a graph
to produce an adjacency array of the graph that can be processed with a variety
of algorithms. This paper provides the mathematical criteria to determine if the
product will have the required structure of the adjacency array of the graph.
The values in the resulting adjacency array are determined by the corresponding
addition $\oplus$ and multiplication $\otimes$ operations used to perform the
array multiplication. Illustrations of the various results possible from
different $\oplus$ and $\otimes$ operations are provided using a small
collection of popular music metadata.
Comment: 8 pages, 5 figures, accepted to IEEE IPDPS 2017 Workshop on Graph
Algorithm Building Block
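The role of the addition and multiplication operations can be sketched as follows. The example assumes the common convention that the adjacency array is the product of the transposed out-vertex incidence array with the in-vertex incidence array. The names E_out, E_in, and array_multiply, and the three-edge toy graph, are hypothetical and not taken from the paper.

```python
from functools import reduce
import operator

def array_multiply(E_out, E_in, add, mul):
    """Return A with A[i][j] = add-reduction over edges k of
    mul(E_out[k][i], E_in[k][j]), i.e. the transpose of E_out times E_in."""
    n_edges = len(E_out)
    n_out, n_in = len(E_out[0]), len(E_in[0])
    return [[reduce(add, (mul(E_out[k][i], E_in[k][j]) for k in range(n_edges)))
             for j in range(n_in)]
            for i in range(n_out)]

# Toy graph (assumption): edges 0->1, 0->1 (a repeated edge), and 1->2.
# Rows are edges, columns are vertices.
E_out = [[1, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
E_in = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 1]]

# Ordinary + and x: entries count the edges between each vertex pair.
A_plus_times = array_multiply(E_out, E_in, operator.add, operator.mul)

# max and x: entries only record whether an edge exists.
A_max_times = array_multiply(E_out, E_in, max, operator.mul)

print(A_plus_times)
print(A_max_times)
```

Both products have the same nonzero pattern here; only the stored values change with the choice of operations.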
Polystore mathematics of relational algebra
Financial transactions, internet search, and data analysis are all placing increasing demands on databases. SQL, NoSQL, and NewSQL databases have been developed to meet these demands and each offers unique benefits. SQL, NoSQL, and NewSQL databases also rely on different underlying mathematical models. Polystores seek to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Integrating the underlying mathematics of these diverse databases can be an important enabler for polystores as it enables effective reasoning across different databases. Associative arrays provide a common approach for the mathematics of polystores by encompassing the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). Prior work presented the SQL relational model in terms of associative arrays and identified key mathematical properties that are preserved within SQL. This work provides the rigorous mathematical definitions, lemmas, and theorems underlying these properties. Specifically, SQL Relational Algebra deals primarily with relations - multisets of tuples - and operations on and between those relations. These relations can be modeled as associative arrays by treating tuples as non-zero rows in an array. Operations in relational algebra are built as compositions of standard operations on associative arrays which mirror their matrix counterparts. These constructions provide insight into how relational algebra can be handled via array operations. As an example application, the composition of two projection operations is shown to also be a projection, and the projection of a union is shown to be equal to the union of the projections
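The two projection identities mentioned above can be checked on a toy example. The sketch below is not the paper's associative-array construction: it assumes relations are multisets stored in a collections.Counter, that union is additive (bag union), and that projection sums the multiplicities of rows that become equal. The relations R and S and the function name project are hypothetical.

```python
from collections import Counter

def project(relation, columns):
    """Keep only the named columns, summing multiplicities of merged rows."""
    out = Counter()
    for row, count in relation.items():
        kept = tuple((name, value) for name, value in row if name in columns)
        out[kept] += count
    return out

# Each row is a tuple of (column, value) pairs; the Counter value is the
# number of times the row occurs in the multiset.
R = Counter({
    (("artist", "A"), ("genre", "pop"), ("year", 2001)): 1,
    (("artist", "B"), ("genre", "pop"), ("year", 2001)): 2,
})
S = Counter({
    (("artist", "A"), ("genre", "rock"), ("year", 2001)): 1,
})

# Composing two projections gives the projection onto the shared columns.
composed = project(project(R, {"artist", "genre"}), {"genre", "year"})
direct = project(R, {"genre"})

# Projecting a (bag) union equals the union of the projections.
proj_of_union = project(R + S, {"year"})
union_of_projs = project(R, {"year"}) + project(S, {"year"})

print(composed == direct, proj_of_union == union_of_projs)
```

Additive union is chosen deliberately: with a max-based multiset union, a multiplicity-summing projection would not in general commute with union.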
Large Scale Enrichment and Statistical Cyber Characterization of Network Traffic
Modern network sensors continuously produce enormous quantities of raw data
that are beyond the capacity of human analysts. Cross-correlation of network
sensors increases this challenge by enriching every network event with
additional metadata. These large volumes of enriched network data present
opportunities to statistically characterize network traffic and quickly answer
a key question: "What are the primary cyber characteristics of my network
data?" The Python GraphBLAS and PyD4M analysis frameworks enable anonymized
statistical analysis to be performed quickly and efficiently on very large
network data sets. This approach is tested using billions of anonymized network
data samples from the largest Internet observatory (CAIDA Telescope) and tens
of millions of anonymized records from the largest commercially available
background enrichment capability (GreyNoise). The analysis confirms that most
of the enriched variables follow expected heavy-tail distributions and that a
large fraction of the network traffic is due to a small number of cyber
activities. This information can simplify the cyber analysts' task by enabling
prioritization of cyber activities based on statistical prevalence.
Comment: 8 pages, 8 figures, HPE
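The idea of prioritizing by statistical prevalence can be illustrated with a minimal sketch. It does not use the GraphBLAS or PyD4M APIs. The event labels and counts below are synthetic and exist only to show the calculation, and top_k_share is a hypothetical helper name.

```python
from collections import Counter

def top_k_share(events, k):
    """Fraction of all events accounted for by the k most common labels."""
    counts = Counter(events)
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(k))
    return top / total

# Synthetic labels (assumption): two dominant activities and a long tail
# of ten labels that each occur once.
events = (["scanner"] * 70
          + ["brute-force"] * 20
          + [f"rare-{i}" for i in range(10)])

counts = Counter(events)
share = top_k_share(events, 2)
print(counts.most_common(2), share)
```

With a heavy-tailed distribution, a small k already covers most of the events, which is what makes ranking by prevalence useful for triage.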
pPython Performance Study
pPython seeks to provide a parallel capability that delivers good speed-up
without sacrificing the ease of programming in Python by implementing
partitioned global array semantics (PGAS) on top of a simple file-based
messaging library (PythonMPI) in pure Python. pPython follows an SPMD (single
program multiple data) model of computation. pPython runs on a single node
(e.g., a laptop) running Windows, Linux, or macOS operating systems or on any
combination of heterogeneous systems that support Python, including on a
cluster through a Slurm scheduler interface so that pPython can be executed in
a massively parallel computing environment. Because pPython's messaging is
file-based, a natural question is how its performance compares with
traditional socket-based MPI communication. In
this paper, we present the point-to-point and collective communication
performances of pPython and compare them with those obtained by using mpi4py
with OpenMPI. For large messages, pPython demonstrates performance comparable
to that of mpi4py.
Comment: arXiv admin note: substantial text overlap with arXiv:2208.1490
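The general idea of file-based point-to-point messaging can be sketched as below. This is not PythonMPI's implementation or API: the send and recv functions, the file naming scheme, and the polling interval are all assumptions made for illustration. For brevity the sender and receiver run in the same process here.

```python
import os
import pickle
import tempfile
import time

def _message_path(comm_dir, source, dest, tag):
    # Hypothetical naming scheme: one file per (source, dest, tag) message.
    return os.path.join(comm_dir, f"msg_{source}_to_{dest}_tag{tag}.pkl")

def send(comm_dir, source, dest, tag, obj):
    """Write the message to a temporary file, then rename it into place so
    the receiver never observes a partially written file."""
    final_path = _message_path(comm_dir, source, dest, tag)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(obj, handle)
    os.replace(tmp_path, final_path)

def recv(comm_dir, source, dest, tag, timeout=5.0, poll=0.01):
    """Poll for the message file, load it, and delete it once consumed."""
    path = _message_path(comm_dir, source, dest, tag)
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(f"no message at {path}")
        time.sleep(poll)
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    os.remove(path)
    return obj

with tempfile.TemporaryDirectory() as comm_dir:
    send(comm_dir, source=0, dest=1, tag=7, obj={"x": [1, 2, 3]})
    received = recv(comm_dir, source=0, dest=1, tag=7)
    leftover = os.listdir(comm_dir)

print(received, leftover)
```

In a real SPMD run the two calls would execute in different processes that share a file system, and the cost of writing, polling, and reading files is the kind of overhead that a comparison against socket-based MPI would measure.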
Deployment of Real-Time Network Traffic Analysis using GraphBLAS Hypersparse Matrices and D4M Associative Arrays
Matrix/array analysis of networks can provide significant insight into their
behavior and aid in their operation and protection. Prior work has demonstrated
the analytic, performance, and compression capabilities of GraphBLAS
(graphblas.org) hypersparse matrices and D4M (d4m.mit.edu) associative arrays
(a mathematical superset of matrices). Obtaining the benefits of these
capabilities requires integrating them into operational systems, which comes
with its own unique challenges. This paper describes two examples of real-time
operational implementations. The first is an operational GraphBLAS implementation
that constructs anonymized hypersparse matrices on a high-bandwidth network
tap. The second is an operational D4M implementation that analyzes daily cloud
gateway logs. The architectures of these implementations are presented.
Detailed measurements of the resources and the performance are collected and
analyzed. The implementations are capable of meeting their operational
requirements using modest computational resources (a couple of processing
cores). GraphBLAS is well-suited for low-level analysis of high-bandwidth
connections with relatively structured network data. D4M is well-suited for
higher-level analysis of more unstructured data. This work demonstrates that
these technologies can be implemented in operational settings.
Comment: Accepted to IEEE HPEC, 8 pages, 8 figures, 1 table, 69 references.
arXiv admin note: text overlap with arXiv:2203.13934. text overlap with
arXiv:2309.0180
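A rough sketch of what constructing an anonymized hypersparse traffic matrix involves is given below. It uses a plain dictionary as a stand-in and does not use the GraphBLAS or D4M APIs. The keyed-hash anonymization, the key, the packet list, and the function names are assumptions for illustration, not the deployed method.

```python
import hashlib
from collections import Counter

KEY = b"example-secret-key"  # hypothetical anonymization key

def anonymize(ip, key=KEY):
    """Map an address to a 64-bit index with a keyed hash (an assumption)."""
    digest = hashlib.blake2b(ip.encode("ascii"), digest_size=8, key=key).digest()
    return int.from_bytes(digest, "big")

def traffic_matrix(packets):
    """Store only the nonzero entries: (source, destination) -> packet count.
    The index space is 2**64 by 2**64, so almost every row is empty."""
    matrix = Counter()
    for src, dst in packets:
        matrix[(anonymize(src), anonymize(dst))] += 1
    return matrix

# Synthetic packets (assumption), using private address space.
packets = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.3"),
    ("10.0.0.4", "10.0.0.2"),
]

matrix = traffic_matrix(packets)

# Row sums: packets sent per anonymized source.
source_packets = Counter()
for (src_index, _), count in matrix.items():
    source_packets[src_index] += count

# Fan-out: number of distinct destinations per anonymized source.
fan_out = Counter(src_index for (src_index, _) in matrix)

print(len(matrix), sum(matrix.values()))
```

Because only nonzero entries are stored, memory grows with the number of observed links rather than with the size of the address space.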